Feat: Various Configurable Initializations #161
Conversation
Nice work! I think the implementation is correct (I checked it against the paper). I only had a few questions where things were not clear to me. Regarding the integration into modalities, I think we need a few more iterations to fix some issues around dependencies and coupling.

I'm not the biggest fan of having the initialisation in the parent class of the models (i.e., NNModel). The reason is that weight initialisation depends heavily on the concrete model implementation. For instance, the gpt2 model must have its c_proj parameters initialised in a particular way, whereas CoCa, for instance, has differently named parameters. Currently, we do string matching in the parent class, which introduces low-level, model-specific dependencies there. The same applies to modules, e.g., a custom linear layer introduced in a particular model would also need custom initialisation in the parent class. For each of these special cases we would have to modify the parent class, strongly coupling the dependencies (see the sketch of this anti-pattern below).
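To illustrate the concern, here is a minimal, hypothetical sketch of the string-matching approach being criticised; the class name, parameter names, and the special-case scaling are assumptions for the example, not the actual modalities code:

```python
import torch.nn as nn


class NNModel(nn.Module):
    """Hypothetical parent class that initialises weights for all models."""

    def init_weights(self, std: float) -> None:
        for name, param in self.named_parameters():
            # Model-specific string matching leaks into the parent class:
            # "c_proj" is a gpt2 detail; other models (e.g. CoCa) would each
            # need their own branch here.
            if name.endswith("c_proj.weight"):
                # made-up special-case scaling, purely for illustration
                nn.init.normal_(param, mean=0.0, std=std * 0.5)
            elif "weight" in name and param.dim() >= 2:
                nn.init.normal_(param, mean=0.0, std=std)
```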
We could resolve this by inverting the dependency: we introduce a generic WeightInitializer class that initialises the weights of a model in a generic, configurable way. The WeightInitializer would be passed to the constructor of the concrete model, and the model would then call something like weight_initializer.init_weights(self, weight_init_config). Basically, this is the strategy pattern, where the strategy modifies the calling object (here, the model) in place.

We should have one generic WeightInitializer covering the general cases. For specific models, we can also introduce WeightInitializers that are specific to a certain model. These WeightInitializers should be instantiable as part of the hierarchical instantiation, as we do for the other components. A sketch of this design follows below.
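A minimal sketch of the proposed strategy pattern; the interfaces and names are assumptions for illustration, not a final modalities API:

```python
from abc import ABC, abstractmethod

import torch.nn as nn


class WeightInitializer(ABC):
    """Strategy that initialises a model's weights in place."""

    @abstractmethod
    def init_weights(self, model: nn.Module) -> None:
        ...


class NormalWeightInitializer(WeightInitializer):
    """Generic case: normal initialisation for all weight matrices."""

    def __init__(self, mean: float = 0.0, std: float = 0.02):
        self.mean = mean
        self.std = std

    def init_weights(self, model: nn.Module) -> None:
        for param in model.parameters():
            if param.dim() >= 2:  # skip biases and norm parameters
                nn.init.normal_(param, mean=self.mean, std=self.std)


class GPT2Model(nn.Module):
    """The concrete model receives the initializer via its constructor."""

    def __init__(self, weight_initializer: WeightInitializer, n_layer: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(128, 128) for _ in range(n_layer))
        # The strategy modifies the calling object in place; model-specific
        # rules live in a model-specific WeightInitializer subclass instead
        # of a shared parent class.
        weight_initializer.init_weights(self)
```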
The config for the model and the WeightInitializer would look like this:
```yaml
weight_initializer:
  <weight init config...>

model:
  component_key: model
  variant_key: gpt2
  config:
    n_layer: 2
    n_head_q: 8
    n_head_kv: 4
    ffn_hidden: 128
    weight_initializer:
      instance_key: weight_initializer
      pass_type: BY_REFERENCE
```
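If I read the hierarchical instantiation correctly, pass_type: BY_REFERENCE would inject the already-instantiated weight_initializer component into the model's constructor rather than re-building it from its config, so a single initializer instance can be shared and swapped purely via the config.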
Additionally, I left a bunch of mostly minor comments regarding some questions and ideas.
Draft: Feat/initialization component
LGTM :)
What does this PR do?
This PR implements the weight initializations described in https://arxiv.org/abs/2312.16903:
A weight initialisation component is introduced that modifies the model weights in place (see #168 for more details).
General Changes
std can now be set to the string "auto", in which case it equals sqrt(2 / (5 * hidden_dim)); see e.g. https://arxiv.org/abs/2312.16903 and the sketch below.
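A minimal sketch of how the "auto" value could be resolved; the function name is an assumption for illustration, while the formula is the one stated above:

```python
import math
from typing import Union


def resolve_std(std: Union[float, str], hidden_dim: int) -> float:
    """Resolve the configured std; "auto" maps to sqrt(2 / (5 * hidden_dim)),
    the scaled initialisation from https://arxiv.org/abs/2312.16903."""
    if std == "auto":
        return math.sqrt(2 / (5 * hidden_dim))
    return float(std)


# e.g. hidden_dim=768 yields std ≈ 0.0228
print(resolve_std("auto", hidden_dim=768))
```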
Breaking Changes
Checklist before submitting final PR
- Tests pass (python tests/tests.py)